Predicting Student Admission to University

Python Machine Learning Classification Logistic Regression Streamlit

Project Overview

This project predicts the likelihood of a student being admitted to a university based on key academic and personal attributes. The model assists applicants in assessing their admission chances and provides universities with a data-driven approach to evaluating candidates.

Key Insights

  • GRE, TOEFL, CGPA, and research experience strongly influence admission probability.
  • University rating, SOP, and LOR have a moderate impact on admission chances.
  • Logistic Regression was chosen for its efficiency, interpretability, and suitability for binary classification.
  • Feature scaling with StandardScaler put all inputs on a comparable scale and improved model performance.
  • Achieved 94% accuracy and an F1-score of 91%, indicating a reasonable balance between precision and recall.
  • ROC AUC score of 0.94, showing that the model separates admitted from non-admitted applicants well.
  • Deployed an interactive Streamlit web app for real-time user predictions.
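
For readers less familiar with the metrics quoted above, the following is a minimal sketch of how accuracy, precision, recall, F1-score, and ROC AUC are computed. The labels and scores are made-up toy values, not the project's data, and the helper names are illustrative only; in practice the same numbers would normally come from a library such as scikit-learn.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = admitted."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from hard 0/1 predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): true outcomes and predicted admission probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas ROC AUC is computed from the ranking of the scores alone, which is why both kinds of metric are worth reporting.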

Technical Implementation

  • Data Preprocessing:
    • Checked for missing values and handled inconsistencies.
    • Scaled numerical features using StandardScaler for better model performance.
  • Model Selection:
    • Implemented Logistic Regression, a reliable algorithm for binary classification.
    • Used GridSearchCV for hyperparameter tuning to optimize performance.
  • Model Evaluation:
    • Measured accuracy, precision, recall, and F1-score for performance assessment.
    • Generated a confusion matrix and ROC curve to evaluate classification performance.
  • Deployment:
    • Developed a Streamlit web app for real-time predictions.
    • Serialized the trained model with pickle so the app can load it quickly for inference.
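
The steps above can be sketched roughly as follows. This is not the project's actual code: it uses synthetic data from `make_classification` with seven features as a stand-in for the admissions dataset, and the parameter grid, the `f1` scoring choice, and the train/test split settings are all assumptions made for illustration.

```python
import pickle

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the admissions data (assumption): 7 numeric features.
X, y = make_classification(
    n_samples=400, n_features=7, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling and the classifier live in one pipeline, so the scaler is fit
# only on the training folds during cross-validation.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid over the inverse regularization strength C.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

best_model = search.best_estimator_

# Round-trip through pickle, as a deployed app would do when loading the model.
payload = pickle.dumps(best_model)
restored = pickle.loads(payload)

# Probability of the positive class for each held-out row.
proba = restored.predict_proba(X_test)[:, 1]
```

Pickling the whole pipeline rather than the classifier alone means the fitted scaler travels with the model, so new inputs are scaled exactly as the training data was.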

Key Learnings

  • Feature engineering is crucial: Selecting and scaling the right features significantly improves model performance.
  • Logistic Regression works well for binary classification with interpretable results.
  • Model evaluation should go beyond accuracy: Precision, recall, and F1-score provide deeper insights into performance.
  • Pipeline automation enhances efficiency and makes the workflow more reproducible.
  • Deploying ML models using Streamlit allows users to interact with predictions easily.
  • Real-world applications require continuous improvement, such as expanding the dataset or testing other models for further enhancement.
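
To illustrate what "interpretable results" means for Logistic Regression, here is a small sketch of how coefficients map to a probability and to odds ratios. The coefficient values, the intercept, and the feature names are hypothetical and chosen only for the example; they are not taken from the trained model.

```python
import math


def sigmoid(z):
    """Map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients on standardized features (not the project's values).
coefficients = {"cgpa": 1.2, "gre": 0.6, "research": 0.4}
intercept = -0.3


def admission_probability(z_scores):
    """Probability for an applicant described by standardized feature values."""
    logit = intercept + sum(coefficients[name] * value
                            for name, value in z_scores.items())
    return sigmoid(logit)


# exp(coefficient) is the multiplicative change in the odds of admission
# for a one-standard-deviation increase in that feature, others held fixed.
odds_ratios = {name: math.exp(c) for name, c in coefficients.items()}

average = admission_probability({"cgpa": 0.0, "gre": 0.0, "research": 0.0})
higher_cgpa = admission_probability({"cgpa": 1.0, "gre": 0.0, "research": 0.0})
```

Because each coefficient acts on the log-odds, its sign and size can be read directly as the direction and strength of a feature's influence, which is harder to do with many non-linear models.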